docs(deepnsm): psychometric validation framework + vertical HHTL bundling spec (#39)
Merged
This lands the expansion deferred since session start. It adds:
Evaluation types (transcoded from Python nsm_evaluation.py + prompts.py):
- Prediction: grader output with logprob, rank, match status
- SubstitutabilityScore: per-grader scoring with minimality + entailment deltas
- Explication: NSM paraphrase with legality_score() (primes/molecules/circularity)
+ calculate_averages() + get_truncated()
- AmbiguousExample: masked passage with get_truncated() (removes non-UNK sentences)
- ModelResult: aggregated evaluation across all explications
Static sets via LazyLock (Rust 1.94):
- NSM_PRIMES_SET: 78 primes including multi-word ("a long time", "don't want")
- STOP_WORDS: English stopwords minus NSM primes (one-time filtered)
- is_nsm_prime(), is_stop_word(), LEGAL_PUNCTUATION
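A minimal sketch of the LazyLock pattern described above, using a small illustrative subset of the prime table (the real set holds 78 entries in the crate; the exact contents and lowercasing behavior shown here are assumptions):

```rust
use std::collections::HashSet;
use std::sync::LazyLock;

// Illustrative subset only; the crate's table holds all 78 primes.
// Multi-word primes are stored as whole phrases, not tokenized.
static NSM_PRIMES_SET: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
    [
        "i", "you", "someone", "something", "good", "bad",
        "a long time", "don't want",
    ]
    .into_iter()
    .collect()
});

// Assumed to be case-insensitive; adjust if the crate matches exact case.
fn is_nsm_prime(word: &str) -> bool {
    NSM_PRIMES_SET.contains(word.to_lowercase().as_str())
}

fn main() {
    assert!(is_nsm_prime("a long time")); // multi-word prime matches whole
    assert!(is_nsm_prime("Someone"));
    assert!(!is_nsm_prime("table"));
}
```

The set is built exactly once on first access, which is also what makes the one-time stopword filtering (stopwords minus primes) cheap.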
CAM-PQ bridge:
- load_nsm_codebook(): codebook_pq.bin → CamCodebook (96KB, [6][256][16] f32)
- load_cam_codes(): cam_codes.bin → Vec<CamFingerprint> (5050 × 6 bytes)
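A hedged sketch of the `cam_codes.bin` decode step: the file is described as a flat run of 5050 six-byte codes (one u8 centroid index per subspace). The `[u8; 6]` representation and the `parse_cam_codes` helper name are assumptions for illustration:

```rust
// Assumed repr: one centroid index per subspace, 6 subspaces per fingerprint.
type CamFingerprint = [u8; 6];

// Hypothetical parse step over the raw file bytes (5050 × 6 bytes on disk).
fn parse_cam_codes(bytes: &[u8]) -> Option<Vec<CamFingerprint>> {
    if bytes.len() % 6 != 0 {
        return None; // truncated or corrupt file
    }
    Some(
        bytes
            .chunks_exact(6)
            .map(|c| c.try_into().unwrap()) // each chunk is exactly 6 bytes
            .collect(),
    )
}

fn main() {
    let raw = [1u8, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
    let codes = parse_cam_codes(&raw).unwrap();
    assert_eq!(codes.len(), 2);
    assert_eq!(codes[1], [7, 8, 9, 10, 11, 12]);
}
```

The codebook side is the same idea at f32 granularity: [6][256][16] × 4 bytes = 98,304 bytes, matching the 96KB figure.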
36-bit SPO triple:
- SpoTriple: 12-bit subject + predicate + object packed in u64
- new(), subject(), predicate(), object()
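The 36-bit packing can be sketched as three 12-bit fields in the low bits of a u64. The field order (subject in the high bits) is an assumption; the accessor names follow the list above:

```rust
/// Sketch: 12-bit subject | 12-bit predicate | 12-bit object = 36 bits in a u64.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct SpoTriple(u64);

impl SpoTriple {
    const MASK: u64 = 0xFFF; // 12 bits → ids in 0..4096

    fn new(subject: u16, predicate: u16, object: u16) -> Self {
        // Each id must fit in 12 bits.
        debug_assert!(subject < 4096 && predicate < 4096 && object < 4096);
        SpoTriple(
            ((subject as u64) << 24) | ((predicate as u64) << 12) | (object as u64),
        )
    }

    fn subject(self) -> u16 {
        ((self.0 >> 24) & Self::MASK) as u16
    }
    fn predicate(self) -> u16 {
        ((self.0 >> 12) & Self::MASK) as u16
    }
    fn object(self) -> u16 {
        (self.0 & Self::MASK) as u16
    }
}

fn main() {
    let t = SpoTriple::new(4095, 7, 2048);
    assert_eq!((t.subject(), t.predicate(), t.object()), (4095, 7, 2048));
}
```

Packing into one u64 keeps a triple in a single register and makes it directly usable as a map key or sort key.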
Prompt templates + builders:
- NSM_EXPLICATION_SYS_INST, RECOVERY_PROMPT_SYS_INST
- build_explication_prompt() with few-shot support
- build_recover_prompt() with optional explication hint
23 tests passing (12 original + 11 new).
https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
All consumer code uses crate::simd only. Zero raw intrinsics.
LazyLock dispatch table selects AVX-512 vs AVX2 at startup.
cam_pq.rs — squared_l2():
- Called 1,536× per CAM-PQ query (6 subspaces × 256 centroids)
- Was: scalar iter().zip().map().sum()
- Now: F32x16 for 16D subvectors (one SIMD lane = one subspace dimension)
- Fast path: n==16 → single load-subtract-multiply-reduce
- Medium path: n>=16 → chunked F32x16 with mul_add + scalar remainder
- Estimated 16× speedup on hot path
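A scalar-equivalent sketch of the chunked kernel, assuming the shape described above: 16 floats per iteration (mirroring one F32x16 load) with `mul_add`, plus a scalar tail. The real version maps each chunk to single `crate::simd` vector ops; this portable form only shows the structure:

```rust
// Portable sketch of the chunked squared-L2 kernel. In the crate, each
// 16-element inner loop is a single F32x16 load-subtract-mul_add.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = 0.0f32;
    let chunked = a.len() / 16 * 16;
    for i in (0..chunked).step_by(16) {
        let mut lane = 0.0f32;
        for j in i..i + 16 {
            let d = a[j] - b[j];
            lane = d.mul_add(d, lane); // fused multiply-add per element
        }
        acc += lane;
    }
    for j in chunked..a.len() {
        // scalar remainder for n not divisible by 16
        let d = a[j] - b[j];
        acc += d * d;
    }
    acc
}

fn main() {
    // CAM-PQ fast path: n == 16, exactly one chunk, no tail.
    let a = vec![1.0f32; 16];
    let b = vec![0.0f32; 16];
    assert_eq!(squared_l2(&a, &b), 16.0);
}
```

Since a CAM-PQ query evaluates this 1,536 times on 16D subvectors, nearly every call takes the single-chunk fast path.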
deepnsm.rs — nsm_decompose() normalization:
- Was: scalar iter().sum() + scalar /= loop
- Now: F32x16 accumulation (4×16=64 elements) + scalar remainder (10)
- Normalize via F32x16 * splat(1/sum) + scalar tail
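The normalization trick above, scalar-sketched: one division produces the reciprocal, which is then broadcast (splat, in SIMD terms) as a multiply over all elements. Function name and zero-sum guard are illustrative:

```rust
// Sketch: sum-normalize in place. One division, then only multiplies;
// the crate does the multiply pass 16 lanes at a time via F32x16 * splat.
fn normalize(v: &mut [f32]) {
    let sum: f32 = v.iter().sum();
    if sum != 0.0 {
        let inv = 1.0 / sum;
        for x in v.iter_mut() {
            *x *= inv;
        }
    }
}

fn main() {
    // 74 elements: 4×16 SIMD chunks + 10-element scalar tail in the real kernel.
    let mut v = vec![1.0f32; 74];
    normalize(&mut v);
    let total: f32 = v.iter().sum();
    assert!((total - 1.0).abs() < 1e-4);
}
```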
deepnsm.rs — nsm_to_fingerprint() XOR:
- Was: scalar for j in 0..1250 { result[j] ^= pattern[j] }
- Now: U8x64 XOR (19×64=1216 bytes) + scalar remainder (34 bytes)
- 64 bytes per SIMD operation vs 1 byte scalar
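A portable sketch of the chunked XOR, assuming the 1250-byte fingerprint length given above: 19 full 64-byte chunks plus a 34-byte remainder. In the crate each chunk is a single U8x64 XOR; here the inner loop stands in for that one vector op:

```rust
// Sketch of the binding XOR over a 1250-byte fingerprint.
// 1250 = 19 × 64 + 34 → 19 wide chunks + scalar remainder.
fn xor_into(result: &mut [u8], pattern: &[u8]) {
    assert_eq!(result.len(), pattern.len());
    let mut r = result.chunks_exact_mut(64);
    let mut p = pattern.chunks_exact(64);
    for (rc, pc) in (&mut r).zip(&mut p) {
        // One U8x64 XOR per 64-byte chunk in the real kernel.
        for (a, b) in rc.iter_mut().zip(pc) {
            *a ^= b;
        }
    }
    for (a, b) in r.into_remainder().iter_mut().zip(p.remainder()) {
        *a ^= b; // 34-byte scalar tail
    }
}

fn main() {
    let mut result = vec![0xAAu8; 1250];
    let pattern = vec![0x55u8; 1250];
    xor_into(&mut result, &pattern);
    assert!(result.iter().all(|&b| b == 0xFF));
    xor_into(&mut result, &pattern); // XOR is self-inverse: unbind recovers
    assert!(result.iter().all(|&b| b == 0xAA));
}
```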
deepnsm.rs — nsm_similarity() cosine:
- Was: scalar 3-accumulator loop over 74 elements
- Now: F32x16 with mul_add for dot/mag_a/mag_b (4×16=64) + scalar tail (10)
- Three reductions in one pass
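The single-pass structure can be sketched in scalar form: three accumulators (dot product and both squared magnitudes) advance together, each via `mul_add`, and are reduced once at the end. The crate carries them as F32x16 vectors; the zero-magnitude guard here is an assumption:

```rust
// Sketch of single-pass cosine similarity with three fused accumulators.
fn nsm_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut mag_a, mut mag_b) = (0.0f32, 0.0f32, 0.0f32);
    for (&x, &y) in a.iter().zip(b) {
        dot = x.mul_add(y, dot);     // Σ x·y
        mag_a = x.mul_add(x, mag_a); // Σ x²
        mag_b = y.mul_add(y, mag_b); // Σ y²
    }
    if mag_a == 0.0 || mag_b == 0.0 {
        0.0 // assumed convention for a zero vector
    } else {
        dot / (mag_a.sqrt() * mag_b.sqrt())
    }
}

fn main() {
    let a = vec![1.0f32; 74]; // 74 elements: 4×16 SIMD + 10-element tail
    assert!((nsm_similarity(&a, &a) - 1.0).abs() < 1e-5);
    let b = vec![-1.0f32; 74];
    assert!((nsm_similarity(&a, &b) + 1.0).abs() < 1e-5);
}
```

One pass over the data means the 74-element vectors are read once instead of three times.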
23 deepnsm tests + 7 dispatch tests passing. Zero regressions.
https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
Document the category-padded SoA layout (16 categories × 16 slots = 256 F32x16 lanes) as a future optimization concept in deepnsm.rs. Verified no overlap with existing patterns:
- blasgraph CSR/CSC: graph adjacency matrix, not semantic vectors
- SPO semiring: cost algebra, not vector layout
- neighborhood CLAM: search scope, not decomposition format
- aabb/spatial_hash SoA: spatial coords (x,y,z), not semantic categories
- dn_tree SoA: HV summary layout, not NSM category decomposition
The concept is clean for future implementation.
https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7
docs(deepnsm): psychometric validation framework + vertical HHTL bundling spec
Two architectural concepts saved for dedicated implementation sessions:
1. Psychometric validation for DeepNSM measurement instrument:
- Cronbach's α across 128 projections (2³ SPO × 2⁴ HHTL)
- Split-half reliability: Strategy A vs Strategy B distance
- IRT item parameters: per-word difficulty + discrimination
- Factor analysis: do 74 primes factor into 16 NsmCategory?
- Construct/convergent/discriminant validity across codec chain
- Polysemy detection via α drop across projections
- P-values with 128 independent measurements per pair
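Cronbach's α over the 128 projections can be sketched with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of totals), treating each projection as an "item" and each measured pair as an observation. This is an illustrative implementation of the textbook statistic, not code from the spec:

```rust
// Hypothetical sketch: Cronbach's alpha over k projections ("items"), each a
// vector of scores for the same n measured pairs. Population variance is used.
fn cronbach_alpha(items: &[Vec<f64>]) -> f64 {
    let k = items.len() as f64;
    let n = items[0].len();
    let var = |xs: &[f64]| {
        let m = xs.iter().sum::<f64>() / xs.len() as f64;
        xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64
    };
    // Sum of per-projection variances.
    let item_var: f64 = items.iter().map(|i| var(i)).sum();
    // Variance of the per-pair total scores across all projections.
    let totals: Vec<f64> = (0..n).map(|j| items.iter().map(|i| i[j]).sum()).collect();
    k / (k - 1.0) * (1.0 - item_var / var(&totals))
}

fn main() {
    // Two perfectly correlated projections → alpha = 1.0.
    let items = vec![vec![1.0, 2.0, 3.0], vec![1.0, 2.0, 3.0]];
    assert!((cronbach_alpha(&items) - 1.0).abs() < 1e-9);
}
```

Under this framing, the polysemy signal in the spec is an α drop: a polysemous word scores inconsistently across projections, inflating item variance relative to total variance.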
2. Vertical HHTL bundling (studio mixing analogy):
- Leaves → bundle → Twigs → bundle → Branches → bundle → Hip
- Each level = majority vote denoising (background noise removal)
- Unbind bottom-up to verify reconstruction (information loss audit)
- Combined SPO × HHTL = 128-way factorial decomposition
- Cascade as psychometric filter: discrimination, factor analysis,
composite reliability, SEM, residual analysis
Key insight: NARS confidence IS measurement reliability (formalized).
Every similarity judgment gets a confidence interval backed by
128 independent projection measurements.
https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7